.. _harmonized:

Working with Harmonized Variables
************************************

This exercise shows you how to work effectively with versioned and harmonized SOEP variables. Please note that the new SOEP versioning and harmonizing concept has only been available since SOEP-Core v34 and only applies to the original SOEP-Core data in long format.

**Create an exercise path with four subfolders:**

.. figure:: png/uebungspfade.png
   :align: center

**Example:**

- H:/material/exercises/do
- H:/material/exercises/output
- H:/material/exercises/temp
- H:/material/exercises/log

These are used to store your scripts, log files, datasets, and temporary datasets. Open an empty do-file and define your paths with globals:

.. literalinclude:: docs/Harmonisierung.do
   :linenos:
   :lines: 11-19

The global “AVZ” defines the main path. The main path is subdivided using the globals “MY_IN_PATH”, “MY_DO_FILES”, “MY_LOG_OUT”, “MY_OUT_DATA”, and “MY_OUT_TEMP”. The global “MY_IN_PATH” contains the path to the data you ordered.

**1.) Differences in Response Options**

Variables are versioned and harmonized when the response options have changed over time.

.. figure:: png/Harmonisierung_01.png
   :align: center

.. figure:: png/Harmonisierung_02.png
   :align: center

The variable plb0038_v1 was obtained from a simple yes/no question asked between 1992 and 2004. In 2005, new response options were added. The individual questionnaires from 2004 and 2005 show these differences. Because the variable plb0038 is versioned, the data user can recognize this difference when tabulating the variable. The variable label also shows the beginning and end of the period in which each version of the question was asked.

.. literalinclude:: docs/Harmonisierung.do
   :linenos:
   :lines: 21-23

.. figure:: png/Harmonisierung_03.png
   :align: center

.. figure:: png/Harmonisierung_04.png
   :align: center

During the harmonization process, the variable plb0038_v1 is recoded and written into a new variable, plb0038_h, together with plb0038_v2. The harmonized version of the variable is intended to cover the survey period from 1992 to 2014 and to be usable across all of these years.

.. literalinclude:: docs/Harmonisierung.do
   :linenos:
   :lines: 38-39

.. figure:: png/Harmonisierung_05.png
   :align: center

.. figure:: png/Harmonisierung_06.png
   :align: center

**2.) Differences in Coding of Response Options**

Variables are versioned and harmonized when the coding of the response options has changed over time. Since the values assigned to certain response options can change, the various wave-specific variables cannot easily be integrated into a single variable in long format. The variable must be harmonized appropriately to be usable.

.. figure:: png/Harmonisierung_07.png
   :align: center

.. figure:: png/Harmonisierung_08.png
   :align: center

From 1994 to 2004, the question about "job change" was asked in the individual questionnaire as a categorical question with six response options. The order of the response options changed in 2005.

.. literalinclude:: docs/Harmonisierung.do
   :linenos:
   :lines: 41-42

.. figure:: png/Harmonisierung_09.png
   :align: center

.. figure:: png/Harmonisierung_10.png
   :align: center

In addition to the order of the response options, their coding also changed. The data are stored in the wave-specific “raw” datasets with different codings and are contained in the variables plb0284_v1 and plb0284_v2. To use the variable for all survey years, the different versions have to be harmonized. In the harmonization process, the variable plb0284_v1 is recoded (recode (1=1)(2=2)(3=3)(4=6)(5=4)(6=5)) and then written together with plb0284_v2 into the new variable plb0284_h.

.. literalinclude:: docs/Harmonisierung.do
   :linenos:
   :lines: 44-45

.. figure:: png/Harmonisierung_11.png
   :align: center

.. figure:: png/Harmonisierung_12.png
   :align: center

**3.) Content Differences in the Questions**

Variables are versioned when questions were asked differently in different years but the content belongs together. If the content or wording of the question changes, the wave-specific variables cannot easily be integrated into a single variable in long format.

.. figure:: png/Harmonisierung_13.png
   :align: center

.. figure:: png/Harmonisierung_14.png
   :align: center

In the 2001 individual questionnaire, respondents were asked whether they had ever received an inheritance. In 2017, this question was worded differently: respondents were asked whether they had received an inheritance in the last 15 years. The questions are similar but cover different time periods. Therefore, the variable is not harmonized but is made available as versioned variables. Data users have to decide for themselves whether the versions can be used interchangeably.

.. literalinclude:: docs/Harmonisierung.do
   :linenos:
   :lines: 47-48

.. figure:: png/Harmonisierung_15.png
   :align: center

.. figure:: png/Harmonisierung_16.png
   :align: center

**4.) Change of Question Type**

Variables are versioned and harmonized when the question type differed between years, for example, first a question with multiple response options and later a question with a single response option. The possibility of giving multiple answers in certain years makes it difficult to integrate the wave-specific variables into a single variable in long format.

.. figure:: png/Harmonisierung_17.png
   :align: center

.. figure:: png/Harmonisierung_18.png
   :align: center

Comparing the question on scholarships in the individual questionnaires from 2011 and 2012 suggests that there should be no differences in the variables. Nevertheless, the two questions were apparently asked differently and stored differently in the raw datasets. This results in several versioned variables.

.. literalinclude:: docs/Harmonisierung.do
   :linenos:
   :lines: 50-53

.. figure:: png/Harmonisierung_19.png
   :align: center

.. figure:: png/Harmonisierung_20.png
   :align: center

.. figure:: png/Harmonisierung_21.png
   :align: center

.. figure:: png/Harmonisierung_22.png
   :align: center

As you can see, from 2007 to 2011 the question was asked as a categorical question with three response options, so respondents could give only one answer. Since 2012, the question has used binary items, so a respondent may well have given more than one answer. The harmonized variable plg0015_h integrates the binary items from plg0015_v2, plg0015_v3, and plg0015_v4. The coding of the variable plg0015_v1 is used as the framework for generating it. In addition, the harmonization proposal takes the problematic multiple answers into account by assigning them the value 4.

.. literalinclude:: docs/Harmonisierung.do
   :linenos:
   :lines: 55-56

.. figure:: png/Harmonisierung_23.png
   :align: center

.. figure:: png/Harmonisierung_24.png
   :align: center

**5.) Euro Harmonization**

Metric variables are versioned and harmonized when they were collected as DM amounts before the introduction of the euro. For the long version of the variable, metric variables based on different currencies in different years are harmonized as euro amounts. Most of the variables harmonized in the long datasets are amounts of money. Before the introduction of the euro, such information was collected in DM.

.. figure:: png/Harmonisierung_25.png
   :align: center

.. figure:: png/Harmonisierung_26.png
   :align: center

In euro harmonization, DM amounts are multiplied by the exchange rate so that the harmonized version of the variable represents euro amounts.

.. literalinclude:: docs/Harmonisierung.do
   :linenos:
   :lines: 58-59

.. figure:: png/Harmonisierung_27.png
   :align: center

.. figure:: png/Harmonisierung_28.png
   :align: center

Last change: |today|
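
**Supplementary sketch.** The recoding described in section 2 and the currency conversion described in section 5 can be sketched as follows. This is a minimal illustration on toy data, not the code from Harmonisierung.do and not SOEP data. The names my_plb0284_h, amount_v1, amount_v2, and amount_h are hypothetical, and the actual SOEP harmonization may treat missing codes and rounding differently.

.. code-block:: stata

   * --- Section 2: harmonizing two differently coded versions ---
   * Toy data (assumption): v1 is filled up to 2004, v2 from 2005 onward
   clear
   input syear plb0284_v1 plb0284_v2
   2003 4 .
   2004 5 .
   2005 . 6
   2006 . 4
   end

   * Map the old coding to the new one, using the mapping quoted in the text
   recode plb0284_v1 (1=1) (2=2) (3=3) (4=6) (5=4) (6=5), generate(my_plb0284_h)

   * Take the second version wherever the first one is not available
   replace my_plb0284_h = plb0284_v2 if missing(my_plb0284_h) & !missing(plb0284_v2)

   * --- Section 5: converting DM amounts to euro ---
   * Toy data (assumption): amount_v1 is in DM, amount_v2 is in euro
   clear
   input syear amount_v1 amount_v2
   2000 1000 .
   2001 1500 .
   2002 . 800
   end

   * Official fixed rate: 1 EUR = 1.95583 DM, so dividing by 1.95583
   * is the same as multiplying the DM amount by about 0.51129
   generate double amount_h = amount_v1 / 1.95583 if !missing(amount_v1)
   replace amount_h = amount_v2 if missing(amount_h)

The toy data use Stata's system missing value (.) for years in which a version was not asked. Any special missing codes in the real data would need to be handled explicitly before combining versions in this way.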